- Course materials: https://github.com/gl2668/R_For_Data_Science
- Me
  - Urban Planner & Data Analyst
  - Contact me via email: gl2668@columbia.edu
- You
  - Roundtable of names, interests, and background
  - Why do you want to learn R?
May 26, 2020
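- Tidyverse
- dplyr
- ggplot2
- purrr
- psych and base R
- caret
- stringr and rebus packages
- tidytext, tm and wordcloud
- rvest
- httr
- shiny

In Markdown, surround text with single asterisks for italics, double asterisks for bold, and backticks to make it monospaced (like code):
*italics*, **bold**, `code`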
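To add a link, put the link text in square brackets followed by the full URL in parentheses:
[Columbia U](https://www.columbia.edu)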
To create titles and headers, use leading hashtags. The number of hashtags determines the header's level:
# First level header
## Second level header
### Third level header
To make a bulleted list in Markdown, place each item on a new line after an asterisk and a space, like this:
* item 1
* item 2
* item 3
You can make an ordered list by placing each item on a new line after a number followed by a period followed by a space.
1. item 1
2. item 2
3. item 3
You can also use Markdown syntax to embed LaTeX math equations in your reports. To embed an equation in its own centered equation block, surround the equation with two pairs of dollar signs, like this:
$$1 + 1 = 2$$
To embed an equation inline, surround it with a single pair of dollar signs, like this: $1 + 1 = 2$
All standard LaTeX symbols work.
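As a slightly richer illustration (the formula is an arbitrary example, not taken from the course materials), the sample mean could be written as a display equation like this:

```latex
$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$$
```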
R code can be included as a chunk with
```{r}
# R code goes here
```
or inline, by wrapping it in single backticks that start with the letter r, for example `r 1 + 1`.
R functions sometimes return messages, warnings, and even error messages. By default, R Markdown will include these messages in your report. You can use the message, warning and error options to prevent R Markdown from displaying these.
The keyboard shortcut to create a new chunk is Command + Option + I (Ctrl + Alt + I on Windows and Linux).
Three of the most popular chunk options are echo, eval and results.
If echo = FALSE, R Markdown will not display the code in the final document (but it will still run the code and display its results unless told otherwise).
If eval = FALSE, R Markdown will not run the code or include its results (but it will still display the code unless told otherwise).
If results = 'hide', R Markdown will not display the results of the code (but it will still run the code and display the code itself unless told otherwise).
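Chunk options can also be set once for the whole document instead of chunk by chunk. A minimal sketch, assuming the knitr package is installed; the particular values chosen here are only an illustration, not the course's settings:

```r
# Assumes knitr is installed; the option values are illustrative.
library(knitr)

# Set defaults that apply to every chunk that follows
opts_chunk$set(
  echo = FALSE,     # hide the code
  message = FALSE,  # hide messages
  warning = FALSE   # hide warnings
)

# Inspect a current default
opts_chunk$get("echo")
```

An individual chunk can still override these defaults in its own header.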
knitr is an engine for dynamic report generation with R and is used to convert (or “knit”) R Markdown files into the desired output format.
#install.packages("dplyr")
library(dplyr)
#install.packages("pacman")
library(pacman)
p_load(dplyr, ggplot2)
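p_load() installs a package if it is missing and then loads it. A rough base R sketch of that idea is below; `load_or_install` is a hypothetical helper written for illustration and is not part of pacman.

```r
# Hypothetical helper: install a package only when it is missing, then attach it
load_or_install <- function(pkgs) {
  for (pkg in pkgs) {
    if (!requireNamespace(pkg, quietly = TRUE)) {
      install.packages(pkg)  # needs an internet connection and a CRAN mirror
    }
    library(pkg, character.only = TRUE)
  }
  invisible(pkgs)
}

# "stats" and "utils" ship with R, so nothing is downloaded here
load_or_install(c("stats", "utils"))
```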
- ggplot2 (graphics)
- tibble (data frames and tables)
- tidyr (make tidy)
- readr (read in tabular formats)
- purrr (functional programming)
- dplyr (manipulate data)
- tidyverse (all of the above)

R ships with built-in datasets such as mtcars. To list them, run data() in the console.
head(mtcars)
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
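Beyond head(), a few base R functions give a quick overview of a built-in dataset. A short sketch using mtcars; the variable names `n_cars` and `n_vars` are chosen here for illustration:

```r
# mtcars ships with R (the datasets package), so no import is needed
str(mtcars)          # column types and a preview of the values
summary(mtcars$mpg)  # min, quartiles, median, mean and max of one column

n_cars <- nrow(mtcars)  # number of rows (cars)
n_vars <- ncol(mtcars)  # number of columns (variables)
```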
# Using the data.table package to read files
p_load(data.table)
flights <- fread("https://raw.githubusercontent.com/Rdatatable/data.table/master/vignettes/flights14.csv")
# Check working directory
getwd()
## [1] "/Users/geraldlee/Documents/Intro to R"
# Set working directory
setwd('/Users/geraldlee/Documents/Intro to R')
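Hard-coded absolute paths only work on one machine. A small sketch of building a relative path with file.path() and restoring the previous working directory afterwards; here tempdir() stands in for a project folder, which is an assumption made so the example runs anywhere:

```r
old_wd <- getwd()          # remember where we started
setwd(tempdir())           # tempdir() stands in for a project folder

# file.path() joins path components with "/"
data_path <- file.path("data", "data_example1.xlsx")
file.exists(data_path)     # check that the file is there before reading it

setwd(old_wd)              # restore the original working directory
```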
# Using the readxl package to read in Excel files
library(readxl)
rawData <- read_excel(path = "data/data_example1.xlsx", # Path to file
sheet = 2, # We want the second sheet
skip = 1, # Skip the first row
na = "NA") # Missing characters are "NA"
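# fread() reads delimited text files (CSV, TSV), not Excel workbooks,
# so export the sheet to CSV first (the .csv file name below is an assumed example)
rawData <- fread("data/data_example1.csv")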
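fread() and base R's read.csv() both read delimited text files. The sketch below writes a small made-up data frame to a temporary CSV file and reads it back, so it runs without any course data; the column names and values are invented for illustration.

```r
# Made-up data, for illustration only
example_df <- data.frame(
  id    = 1:3,
  score = c(90.5, NA, 72.0)
)

# Write to a temporary CSV file
csv_path <- tempfile(fileext = ".csv")
write.csv(example_df, csv_path, row.names = FALSE)

# na.strings tells read.csv which text values mean "missing"
read_back <- read.csv(csv_path, na.strings = "NA")
read_back
```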
head(flights) # head() / tail() show the first / last 6 rows by default
##    year month day dep_delay arr_delay carrier origin dest air_time distance
## 1: 2014     1   1        14        13      AA    JFK  LAX      359     2475
## 2: 2014     1   1        -3        13      AA    JFK  LAX      363     2475
## 3: 2014     1   1         2         9      AA    JFK  LAX      351     2475
## 4: 2014     1   1        -8       -26      AA    LGA  PBI      157     1035
## 5: 2014     1   1         2         1      AA    JFK  LAX      350     2475
## 6: 2014     1   1         4         0      AA    EWR  LAX      339     2454
##    hour
## 1:    9
## 2:   11
## 3:   19
## 4:    7
## 5:   13
## 6:   18
dim(flights) # Get the shape of the data
## [1] 253316 11
colnames(flights) # Get the column names
## [1] "year"      "month"     "day"       "dep_delay" "arr_delay" "carrier"
## [7] "origin"    "dest"      "air_time"  "distance"  "hour"
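The same inspection functions work on any data frame. A small sketch on the built-in mtcars data (used here so the example needs no download), showing how to ask head() and tail() for a different number of rows:

```r
first_three <- head(mtcars, n = 3)   # first 3 rows instead of the default 6
last_two    <- tail(mtcars, n = 2)   # last 2 rows

dim(first_three)        # number of rows, then number of columns
colnames(mtcars)[1:3]   # first three column names
```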
?ggplot2
help(dplyr)
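A few other base R ways to look things up, sketched with functions that ship with R so that no extra packages are assumed; the variable name `matches` is chosen here for illustration:

```r
help("mean")              # same as ?mean
help(package = "stats")   # index of all help pages in a package

# apropos() searches the names of available objects with a regular expression
matches <- apropos("^read\\.csv")
matches

example("mean")           # run the examples from a help page
```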